Conversation

@Red-Portal Red-Portal commented Sep 15, 2025

This PR removes the use of the type ParamSpaceSGD, which provides a unifying implementation of VI algorithms that run SGD in parameter space. Instead, each parameter-space SGD-based VI algorithm becomes its own AbstractVariationalAlgorithm, and the code implementing step is shared by dispatching over their Union.

This addresses #204
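For readers skimming the diff, here is a minimal sketch of the arrangement described above; the type parameters, fields, and the step body are illustrative only and not the actual code in this PR:

```julia
abstract type AbstractVariationalAlgorithm end

# Each algorithm is its own subtype of AbstractVariationalAlgorithm.
struct KLMinRepGradDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end
struct KLMinRepGradProxDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end
struct KLMinScoreGradDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end

# The former ParamSpaceSGD type is reduced to a Union used only for dispatch.
const ParamSpaceSGD = Union{
    KLMinRepGradDescent,KLMinRepGradProxDescent,KLMinScoreGradDescent
}

# One shared `step` covers all three algorithms.
function step(alg::ParamSpaceSGD, state)
    # estimate the gradient of the objective and apply an SGD update in
    # parameter space; details omitted in this sketch
    return state
end
```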

@Red-Portal (Member Author)

Hi @yebai, could you check to make sure that this is what you asked for? Personally, I feel the ParamSpaceSGD-based interface is much cleaner and more intuitive, especially in terms of project structure. So I still insist we keep it as an internal implementation detail.

Red-Portal and others added 3 commits September 15, 2025 12:08
elseif alg isa KLMinScoreGradDescent
    return KLMinScoreGradDescentState(prob, q_init, 0, grad_buf, opt_st, obj_st, avg_st)
else
    nothing
Review comment (Member):

Maybe throw a warning or error message here instead of letting it fail silently?

Suggested change
nothing
nothing

@Red-Portal Red-Portal (Member Author), Sep 15, 2025:

It should never hit the else condition, so let me use InvalidStateException.
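Purely as an illustration of that change (the function name and argument list below are hypothetical, mirroring the diff context; the message text is made up), the fall-through branch could look like:

```julia
# Illustrative sketch: throw instead of silently returning `nothing` in the
# branch that should never be reached.
function make_state(alg, prob, q_init, grad_buf, opt_st, obj_st, avg_st)
    if alg isa KLMinScoreGradDescent
        return KLMinScoreGradDescentState(prob, q_init, 0, grad_buf, opt_st, obj_st, avg_st)
    else
        # previously `nothing`; now fail loudly since this branch should be unreachable
        throw(InvalidStateException("unsupported algorithm type $(typeof(alg))", :unreachable))
    end
end
```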

    prob, re(params), iteration, grad_buf, opt_st, obj_st, avg_st
)
else
    nothing
Review comment (Member):

Same as above.

Suggested change
nothing
nothing

    obj_st::ObjSt
    avg_st::AvgSt
end
const ParamSpaceSGD = Union{
Review comment (Member):

Suggested change
const ParamSpaceSGD = Union{
"""
This family of algorithms (`<:KLMinRepGradDescent`,`<:KLMinRepGradProxDescent`,`<:KLMinScoreGradDescent`) applies stochastic gradient descent (SGD) to the variational `objective` over the (Euclidean) space of variational parameters.
The trainable parameters in the variational approximation are expected to be extractable through `Optimisers.destructure`.
This requires the variational approximation to be marked as a functor through `Functors.@functor`.
"""
const ParamSpaceSGD = Union{
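As a toy illustration of the requirement described in the suggested docstring (ToyGaussian is a made-up type, not part of AdvancedVI):

```julia
using Functors, Optimisers

# Marking the approximation as a functor lets Optimisers.destructure extract
# its trainable parameters as a flat vector and rebuild the approximation.
struct ToyGaussian{T}
    location::Vector{T}
    log_scale::Vector{T}
end
Functors.@functor ToyGaussian

q = ToyGaussian(zeros(2), zeros(2))
params, re = Optimisers.destructure(q)  # flat parameter vector + rebuilder
q_new = re(params .+ 0.1)               # rebuild from updated parameters
```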

@yebai yebai left a comment

Thanks @Red-Portal -- I left some comments above. In addition, let's simplify the folder structure a bit for clarity:

  1. move all files in paramspacesgd to algorithms, e.g., "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl"
  2. keep each algorithm in its own file

Also, I'd suggest we consider renaming paramspacesgd.jl to interface.jl or something along those lines:

  • "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"

@Red-Portal (Member Author)

Hi Hong, I planned to do the restructuring in a separate PR to keep things simple in this one. Though, regarding:

> move all files in paramspacesgd to algorithms, e.g., "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl"

> "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"

After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism, so I don't think these names will withstand that change. In fact, that is precisely why I kept everything under paramspacesgd/ in the first place.

@yebai yebai commented Sep 15, 2025

It is okay to keep all algorithms under algorithms and remove the extra subfolder paramspacesgd:

  1. These categorisations are nonstandard, so they are not helping clarity.
  2. There aren't many algorithms, so it is okay to keep all of them under the same algorithms folder.

> After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism

You can add more algorithms to interface.jl, so long as these algorithms are clearly grouped in interface.jl.

@Red-Portal (Member Author)

> After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism

> You can add more algorithms to interface.jl, so long as these algorithms are clearly grouped in interface.jl.

I am saying that these new algorithms can't be grouped in interface.jl, since they will each need their own implementation of step. The current algorithms all go through the same step, which is why they can be grouped (see the sketch below).
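Roughly what I mean (names are hypothetical, reusing the illustrative types from the PR description above): such an algorithm carries its own step method instead of going through the shared one.

```julia
# Hypothetical future algorithm that does not fit the shared SGD-in-parameter-space step.
struct SomeFutureAlgorithm <: AbstractVariationalAlgorithm end

function step(alg::SomeFutureAlgorithm, state)
    # custom update rule that does not reduce to "estimate a gradient, then SGD"
    return state
end
```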

@yebai yebai commented Sep 16, 2025

"grouping" refers to grouping interface code together for similar VI algorithms in the proposed interface.jl. It doesn't require creating a new union type if an algorithm is distinct from others. In these cases, the algorithm could be a singleton group.

@yebai yebai requested a review from sunxd3 September 16, 2025 07:13
@sunxd3 sunxd3 commented Sep 16, 2025

If I understand right, this PR flattens the existing AbstractVariationalAlgorithm → ParamSpaceSGD → (concrete algorithms) hierarchy. That intermediate layer exists today so that anything doing “SGD in parameter space” can share one abstraction.

Before we drop it, may I understand what concrete benefits the flattening delivers? In particular, are we planning to add other algorithm families alongside the current ParamSpaceSGD? If so, we then probably need to add the middle layer abstraction back.

At the moment, I haven't quite convinced myself that flattening the type hierarchy is necessary.
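For reference, the layered arrangement I'm referring to can be sketched roughly as follows (purely illustrative; the actual ParamSpaceSGD implementation may differ in its details):

```julia
abstract type AbstractVariationalAlgorithm end

# Intermediate layer: anything doing SGD in parameter space.
abstract type ParamSpaceSGD <: AbstractVariationalAlgorithm end

struct KLMinRepGradDescent{Obj} <: ParamSpaceSGD
    objective::Obj
end

# Shared behaviour attaches to the intermediate layer.
step(alg::ParamSpaceSGD, state) = state  # shared SGD update would live here
```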

@yebai yebai commented Sep 16, 2025

I suggested removing ParamSpaceSGD because it is not standard terminology in the variational inference literature. Adding this less well-known terminology adds mental overhead to understanding the code.

@Red-Portal Red-Portal commented Sep 16, 2025

@yebai @sunxd3 Thanks for chiming in. Actually, I have a new idea. I believe the main complaint at the moment is that the term ParamSpaceSGD is not intuitive as an abstraction. What if we fix that directly by changing the name to something more obvious?

In a nutshell, the nice thing about the current ParamSpaceSGD interface is that you only need to define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm (there is a rough sketch of this below). So let me make that more explicit in the name. Here are a few candidates:

  • ObjectiveSGD
  • ObjectiveDescentInducedAlgorithm
  • MinObjectiveSGD
  • MinObjectiveWithSGD
  • MinObjectiveBySGD
  • ObjectiveGradientEstimateDescent

or something along these lines? Would that resolve your concern?
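To make the "only a gradient estimator is needed" point concrete, a rough sketch (the objective type and the signature below are illustrative, not the exact AdvancedVI API):

```julia
# Hypothetical objective: forming an algorithm mostly amounts to providing a
# gradient estimator for it.
struct MyObjective end

function estimate_gradient(rng, obj::MyObjective, params, restructure, state)
    grad = zero(params)  # a stochastic estimate of the objective's gradient would go here
    return grad, state
end
# The shared SGD machinery then consumes `grad`; no custom parameter-update
# code is needed for algorithms of this form.
```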

@yebai yebai commented Sep 16, 2025

I think I see your point. But I am not sure that helps.

> define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm.

That probably includes every learning algorithm in ML.

@Red-Portal (Member Author)

> define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm.

> That probably includes every learning algorithm in ML.

Yes, that is indeed almost true! But the point is that there are a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too. So this is the reason I wish to allow for two different abstraction levels. But as you said, most algorithms only require defining a gradient estimator. So the lower-level interface helps unify the code for all those algorithms.

@yebai yebai commented Sep 17, 2025

> a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too.

We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (eg, step) across algorithms.

There might be some insights we can learn by taking a unifying view of parameter space gradient descent VI, but that is a discussion we should have offline for a review paper.

@Red-Portal (Member Author)

> We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (eg, step) across algorithms.

My main beef with using Unions here is the following:

  • This results in duplicating code, as can already be seen in the PR (the structs KLMinRepGradDescent, KLMinRepGradProxDescent, and KLMinScoreGradDescent all have the same fields; see the sketch after this list).
  • To use the shared step function, an <:AbstractVariationalAlgorithm struct has to contain a specific list of fields, a requirement that is now implicit since we're not using a proper interface.
  • I am not sure whether I should document the use of the shared step function. But if I do document it, it's not going to be pretty since it assumes a whole lot of implicit things (as stated in the item above).
  • The structuring of the project will need a bit of discussion since things are a bit more complicated. (What should we call the file containing the shared step function? How should we structure the directories?) As mentioned in a previous comment, we can't just use generic names since I will be adding algorithms that don't use the shared step function in follow-up PRs. Also, people looking at the code base for the first time will probably have to do some mental gymnastics, since lots of things are implicit rather than made explicit through an interface.
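To illustrate the first bullet, the kind of duplication I mean looks roughly like this (the field names are illustrative, not the exact ones in the PR):

```julia
# Each algorithm struct repeats the same fields so that the shared `step`
# can assume they exist; nothing enforces this beyond convention.
struct KLMinRepGradDescent{Obj,AD,Opt,Avg}
    objective::Obj
    adtype::AD
    optimizer::Opt
    averager::Avg
end
struct KLMinScoreGradDescent{Obj,AD,Opt,Avg}
    objective::Obj
    adtype::AD
    optimizer::Opt
    averager::Avg
end
```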

With that said, do you find the solution below still unsatisfying? At least I hope that this resolves your concern that the terminology is non-standard.

> So let me make that more explicit in the name. Here are a few candidates:
>
>   • ObjectiveSGD
>   • ObjectiveDescentInducedAlgorithm
>   • MinObjectiveSGD
>   • MinObjectiveWithSGD
>   • MinObjectiveBySGD
>   • ObjectiveGradientEstimateDescent
>
> or something along these lines? Would that resolve your concern?

If you think we should still go with an implicit interface, then I'll follow for the sake of moving forward.
